Skip to content

Fix direct transpose output offset#146

Merged
romerojosh merged 1 commit into
NVIDIA:mainfrom
fallintoplace:fix/direct-transpose-offset
Jun 24, 2026
Merged

Fix direct transpose output offset#146
romerojosh merged 1 commit into
NVIDIA:mainfrom
fallintoplace:fix/direct-transpose-offset

Conversation

@fallintoplace

Copy link
Copy Markdown
Contributor

Summary

  • Use the output-side shift_b accumulator when computing the direct-transpose destination offset.
  • Add multi-rank MPI_P2P_PL coverage for the direct-transpose pack-offset path with halo and padding enabled.

Notes

  • The requested nontrivial offsets_a != offsets_b public regression is not currently reachable through this branch because direct_transpose is selected only when the communication split count is one. The added test keeps the currently reachable path covered on multiple ranks with output halo/padding.

Validation

  • git diff --check
  • cmake -S . -B build -DCUDECOMP_BUILD_TESTS=ON -DCUDECOMP_TEST_FETCH_GTEST=ON stops during configure because nvc++ is not available in this environment.
  • clang-format is not available in this environment.

@romerojosh romerojosh left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @fallintoplace, thanks for the contribution. This shift_b typo is something I've been aware of for a while but as noted by your PR comment, it doesn't manifest in any runtime issues since this path it only ever exercised with single rank row/column communicators where the shift/shift_b variables are initialized to an offset of 0. Happy to accept the fix once you remove the additional test.

Comment thread tests/ctest/transpose_tests.cc Outdated
Signed-off-by: Minh Vu <vuhoangminh97@gmail.com>
@fallintoplace fallintoplace force-pushed the fix/direct-transpose-offset branch from 42ca482 to 6d314ce Compare June 24, 2026 16:50
@romerojosh

Copy link
Copy Markdown
Collaborator

/build

@github-actions

Copy link
Copy Markdown

🚀 Build workflow triggered! View run

@github-actions

Copy link
Copy Markdown

✅ Build workflow passed! View run

@romerojosh romerojosh merged commit cd17e1f into NVIDIA:main Jun 24, 2026
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants